245 research outputs found
Bayesian segnet: Model uncertainty in deep convolutional encoder-decoder architectures for scene understanding
We present a deep learning framework for probabilistic pixel-wise semantic
segmentation, which we term Bayesian SegNet. Semantic segmentation is an
important tool for visual scene understanding and a meaningful measure of
uncertainty is essential for decision making. Our contribution is a practical
system which is able to predict pixel-wise class labels with a measure of model
uncertainty. We achieve this by Monte Carlo sampling with dropout at test time
to generate a posterior distribution of pixel class labels. In addition, we
show that modelling uncertainty improves segmentation performance by 2-3%
across a number of state of the art architectures such as SegNet, FCN and
Dilation Network, with no additional parametrisation. We also observe a
significant improvement in performance for smaller datasets where modelling
uncertainty is more effective. We benchmark Bayesian SegNet on the indoor SUN
Scene Understanding and outdoor CamVid driving scenes datasets.Toyota Corporatio
A Comparison and Strategy of Semantic Segmentation on Remote Sensing Images
In recent years, with the development of aerospace technology, we use more
and more images captured by satellites to obtain information. But a large
number of useless raw images, limited data storage resource and poor
transmission capability on satellites hinder our use of valuable images.
Therefore, it is necessary to deploy an on-orbit semantic segmentation model to
filter out useless images before data transmission. In this paper, we present a
detailed comparison on the recent deep learning models. Considering the
computing environment of satellites, we compare methods from accuracy,
parameters and resource consumption on the same public dataset. And we also
analyze the relation between them. Based on experimental results, we further
propose a viable on-orbit semantic segmentation strategy. It will be deployed
on the TianZhi-2 satellite which supports deep learning methods and will be
lunched soon.Comment: 8 pages, 3 figures, ICNC-FSKD 201
Knowledge Distillation for Multi-task Learning
Multi-task learning (MTL) is to learn one single model that performs multiple
tasks for achieving good performance on all tasks and lower cost on
computation. Learning such a model requires to jointly optimize losses of a set
of tasks with different difficulty levels, magnitudes, and characteristics
(e.g. cross-entropy, Euclidean loss), leading to the imbalance problem in
multi-task learning. To address the imbalance problem, we propose a knowledge
distillation based method in this work. We first learn a task-specific model
for each task. We then learn the multi-task model for minimizing task-specific
loss and for producing the same feature with task-specific models. As the
task-specific network encodes different features, we introduce small
task-specific adaptors to project multi-task features to the task-specific
features. In this way, the adaptors align the task-specific feature and the
multi-task feature, which enables a balanced parameter sharing across tasks.
Extensive experimental results demonstrate that our method can optimize a
multi-task learning model in a more balanced way and achieve better overall
performance.Comment: We propose a knowledge distillation method for addressing the
imbalance problem in multi-task learnin
Can ground truth label propagation from video help semantic segmentation?
For state-of-the-art semantic segmentation task, training convolutional
neural networks (CNNs) requires dense pixelwise ground truth (GT) labeling,
which is expensive and involves extensive human effort. In this work, we study
the possibility of using auxiliary ground truth, so-called \textit{pseudo
ground truth} (PGT) to improve the performance. The PGT is obtained by
propagating the labels of a GT frame to its subsequent frames in the video
using a simple CRF-based, cue integration framework. Our main contribution is
to demonstrate the use of noisy PGT along with GT to improve the performance of
a CNN. We perform a systematic analysis to find the right kind of PGT that
needs to be added along with the GT for training a CNN. In this regard, we
explore three aspects of PGT which influence the learning of a CNN: i) the PGT
labeling has to be of good quality; ii) the PGT images have to be different
compared to the GT images; iii) the PGT has to be trusted differently than GT.
We conclude that PGT which is diverse from GT images and has good quality of
labeling can indeed help improve the performance of a CNN. Also, when PGT is
multiple folds larger than GT, weighing down the trust on PGT helps in
improving the accuracy. Finally, We show that using PGT along with GT, the
performance of Fully Convolutional Network (FCN) on Camvid data is increased by
on IoU accuracy. We believe such an approach can be used to train CNNs
for semantic video segmentation where sequentially labeled image frames are
needed. To this end, we provide recommendations for using PGT strategically for
semantic segmentation and hence bypass the need for extensive human efforts in
labeling.Comment: To appear at ECCV 2016 Workshop on Video Segmentatio
Geometry meets semantics for semi-supervised monocular depth estimation
Depth estimation from a single image represents a very exciting challenge in
computer vision. While other image-based depth sensing techniques leverage on
the geometry between different viewpoints (e.g., stereo or structure from
motion), the lack of these cues within a single image renders ill-posed the
monocular depth estimation task. For inference, state-of-the-art
encoder-decoder architectures for monocular depth estimation rely on effective
feature representations learned at training time. For unsupervised training of
these models, geometry has been effectively exploited by suitable images
warping losses computed from views acquired by a stereo rig or a moving camera.
In this paper, we make a further step forward showing that learning semantic
information from images enables to improve effectively monocular depth
estimation as well. In particular, by leveraging on semantically labeled images
together with unsupervised signals gained by geometry through an image warping
loss, we propose a deep learning approach aimed at joint semantic segmentation
and depth estimation. Our overall learning framework is semi-supervised, as we
deploy groundtruth data only in the semantic domain. At training time, our
network learns a common feature representation for both tasks and a novel
cross-task loss function is proposed. The experimental findings show how,
jointly tackling depth prediction and semantic segmentation, allows to improve
depth estimation accuracy. In particular, on the KITTI dataset our network
outperforms state-of-the-art methods for monocular depth estimation.Comment: 16 pages, Accepted to ACCV 201
Classical and Quantum Chaos in a quantum dot in time-periodic magnetic fields
We investigate the classical and quantum dynamics of an electron confined to
a circular quantum dot in the presence of homogeneous magnetic
fields. The classical motion shows a transition to chaotic behavior depending
on the ratio of field magnitudes and the cyclotron
frequency in units of the drive frequency. We determine a
phase boundary between regular and chaotic classical behavior in the
vs plane. In the quantum regime we evaluate the quasi-energy
spectrum of the time-evolution operator. We show that the nearest neighbor
quasi-energy eigenvalues show a transition from level clustering to level
repulsion as one moves from the regular to chaotic regime in the
plane. The statistic confirms this
transition. In the chaotic regime, the eigenfunction statistics coincides with
the Porter-Thomas prediction. Finally, we explicitly establish the phase space
correspondence between the classical and quantum solutions via the Husimi phase
space distributions of the model. Possible experimentally feasible conditions
to see these effects are discussed.Comment: 26 pages and 17 PstScript figures, two large ones can be obtained
from the Author
Estimating Depth from RGB and Sparse Sensing
We present a deep model that can accurately produce dense depth maps given an
RGB image with known depth at a very sparse set of pixels. The model works
simultaneously for both indoor/outdoor scenes and produces state-of-the-art
dense depth maps at nearly real-time speeds on both the NYUv2 and KITTI
datasets. We surpass the state-of-the-art for monocular depth estimation even
with depth values for only 1 out of every ~10000 image pixels, and we
outperform other sparse-to-dense depth methods at all sparsity levels. With
depth values for 1/256 of the image pixels, we achieve a mean absolute error of
less than 1% of actual depth on indoor scenes, comparable to the performance of
consumer-grade depth sensor hardware. Our experiments demonstrate that it would
indeed be possible to efficiently transform sparse depth measurements obtained
using e.g. lower-power depth sensors or SLAM systems into high-quality dense
depth maps.Comment: European Conference on Computer Vision (ECCV) 2018. Updated to
camera-ready version with additional experiment
Deep Depth From Focus
Depth from focus (DFF) is one of the classical ill-posed inverse problems in
computer vision. Most approaches recover the depth at each pixel based on the
focal setting which exhibits maximal sharpness. Yet, it is not obvious how to
reliably estimate the sharpness level, particularly in low-textured areas. In
this paper, we propose `Deep Depth From Focus (DDFF)' as the first end-to-end
learning approach to this problem. One of the main challenges we face is the
hunger for data of deep neural networks. In order to obtain a significant
amount of focal stacks with corresponding groundtruth depth, we propose to
leverage a light-field camera with a co-calibrated RGB-D sensor. This allows us
to digitally create focal stacks of varying sizes. Compared to existing
benchmarks our dataset is 25 times larger, enabling the use of machine learning
for this inverse problem. We compare our results with state-of-the-art DFF
methods and we also analyze the effect of several key deep architectural
components. These experiments show that our proposed method `DDFFNet' achieves
state-of-the-art performance in all scenes, reducing depth error by more than
75% compared to the classical DFF methods.Comment: accepted to Asian Conference on Computer Vision (ACCV) 201
XmoNet:a Fully Convolutional Network for Cross-Modality MR Image Inference
Magnetic resonance imaging (MRI) can generate multimodal scans with complementary contrast information, capturing various anatomical or functional properties of organs of interest. But whilst the acquisition of multiple modalities is favourable in clinical and research settings, it is hindered by a range of practical factors that include cost and imaging artefacts. We propose XmoNet, a deep-learning architecture based on fully convolutional networks (FCNs) that enables cross-modality MR image inference. This multiple branch architecture operates on various levels of image spatial resolutions, encoding rich feature hierarchies suited for this image generation task. We illustrate the utility of XmoNet in learning the mapping between heterogeneous T1- and T2-weighted MRI scans for accurate and realistic image synthesis in a preliminary analysis. Our findings support scaling the work to include larger samples and additional modalities
Joint Learning of Intrinsic Images and Semantic Segmentation
Semantic segmentation of outdoor scenes is problematic when there are
variations in imaging conditions. It is known that albedo (reflectance) is
invariant to all kinds of illumination effects. Thus, using reflectance images
for semantic segmentation task can be favorable. Additionally, not only
segmentation may benefit from reflectance, but also segmentation may be useful
for reflectance computation. Therefore, in this paper, the tasks of semantic
segmentation and intrinsic image decomposition are considered as a combined
process by exploring their mutual relationship in a joint fashion. To that end,
we propose a supervised end-to-end CNN architecture to jointly learn intrinsic
image decomposition and semantic segmentation. We analyze the gains of
addressing those two problems jointly. Moreover, new cascade CNN architectures
for intrinsic-for-segmentation and segmentation-for-intrinsic are proposed as
single tasks. Furthermore, a dataset of 35K synthetic images of natural
environments is created with corresponding albedo and shading (intrinsics), as
well as semantic labels (segmentation) assigned to each object/scene. The
experiments show that joint learning of intrinsic image decomposition and
semantic segmentation is beneficial for both tasks for natural scenes. Dataset
and models are available at: https://ivi.fnwi.uva.nl/cv/intrinsegComment: ECCV 201
- …